The aim of these exercises is to …
For each of the examples below, you should follow the sequence we’ve used previously:
What is the biological question?
Is the predictor continuous or categorical?
Write out the linear model corresponding to this question.
What distribution do you expect the response variable to follow?
What are the assumptions behind the statistical model you’ll fit?
Fit the model
How will you assess whether the model fits well?
Can you detect an effect of the predictor?
How do you measure the effect?
What do you conclude (including any cautions)
Li et al. (2015) investigated ways of revegetating damaged freshwater enviroments with vegetation. They used a sweet of 4 plant species, and focused on the combined effects of propagule pressure (i.e., the initial seeding event) and water depth on the aquatic vegetation that developed. They reported data for each plant species separately and considered overall plant “community”, putting all species together.
The four plant species can propagate clonally, by shoot regeneration, and they used shoot fragments to “seed” experimental pots. Each pot had 1, 2, or 4 fragments of each of the four species (i.e. there were three propagule pressures). The pots, filled with natural lake sediment, were placed in larger plastic tanks, which were filled to a depth of 30 or 70 cm. There were 5 tanks for each depth. The tanks were outside in full sun, and plants were allowed to grow for 7 weeks, after which time plants were harvested and measured.
For each species in a pot, five shoots were selected and their length measured, the number of nodes (indicating clonal growth) counted, and dry biomass calculated. The total biomass for that species was also measured and used to estimate total nodes and total shoot length. Values for each species were then used to calulate total biomass, total shoot length, and total number of nodes for that pot.
Their data can be found in Table S1 of the paper, but we’ve extracted the data and renamed some of the columns to make things clearer. Our file is here.
df <-read.csv("data/li.csv")
set_sum_contrasts()
setting contr.sum globally: options(contrasts=c('contr.sum', 'contr.poly'))
head(df, 10)
df$depth<-as.factor(df$depth)
df$tank<-as.factor(df$tank)
Remember to go through the steps at the start of these exercises to make sure you have identified the design and assumptions clearly
#Can use aov or linear mixed models to do this
nodes.lmer <-lmer(nodes~depth + pressure + depth*pressure + (1|tank), REML=TRUE, df)
plot(nodes.lmer)
summary(nodes.lmer, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: nodes ~ depth + pressure + depth * pressure + (1 | tank)
Data: df
REML criterion at convergence: 354.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.40672 -0.66015 -0.04919 0.46002 1.65145
Random effects:
Groups Name Variance Std.Dev.
tank (Intercept) 81944 286.3
Residual 35085 187.3
Number of obs: 30, groups: tank, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 884.15 96.77 8.00 9.137 1.66e-05 ***
depth1 313.09 96.77 8.00 3.236 0.012 *
pressure1 329.60 48.36 16.00 6.815 4.15e-06 ***
pressure2 -288.98 48.36 16.00 -5.975 1.94e-05 ***
depth1:pressure1 70.48 48.36 16.00 1.457 0.164
depth1:pressure2 -36.08 48.36 16.00 -0.746 0.467
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) depth1 prssr1 prssr2 dpt1:1
depth1 0.000
pressure1 0.000 0.000
pressure2 0.000 0.000 -0.500
dpth1:prss1 0.000 0.000 0.000 0.000
dpth1:prss2 0.000 0.000 0.000 0.000 -0.500
anova (nodes.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
depth 367295 367295 1 8 10.4687 0.01196 *
pressure 1937927 968963 2 16 27.6176 6.477e-06 ***
depth:pressure 74527 37263 2 16 1.0621 0.36889
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ggplot(data = df, aes(x=factor(pressure, level = c("low", "medium", "high")), y=nodes, group=depth, colour=depth))+
geom_line(stat = "summary") +
geom_point(stat = "summary")+
theme_minimal()
Number of nodes shows positive effects of shallow water and increasing propagule pressure, with both factors having similar, but independent effects.
#Can use aov or linear mixed models to do this
biomass.lmer <-lmer(biomass~depth + pressure + depth*pressure + (1|tank), REML=TRUE, df)
plot(biomass.lmer)
summary(biomass.lmer, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: biomass ~ depth + pressure + depth * pressure + (1 | tank)
Data: df
REML criterion at convergence: 86
Scaled residuals:
Min 1Q Median 3Q Max
-2.14121 -0.39993 -0.03638 0.47193 1.55739
Random effects:
Groups Name Variance Std.Dev.
tank (Intercept) 1.6417 1.2813
Residual 0.4231 0.6505
Number of obs: 30, groups: tank, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.2914 0.4222 8.0000 5.427 0.000626 ***
depth1 1.0884 0.4222 8.0000 2.578 0.032732 *
pressure1 1.1995 0.1679 16.0000 7.142 2.34e-06 ***
pressure2 -0.9422 0.1679 16.0000 -5.610 3.91e-05 ***
depth1:pressure1 0.5023 0.1679 16.0000 2.991 0.008639 **
depth1:pressure2 -0.2588 0.1679 16.0000 -1.541 0.142915
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) depth1 prssr1 prssr2 dpt1:1
depth1 0.000
pressure1 0.000 0.000
pressure2 0.000 0.000 -0.500
dpth1:prss1 0.000 0.000 0.000 0.000
dpth1:prss2 0.000 0.000 0.000 0.000 -0.500
anova (biomass.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
depth 2.8113 2.8113 1 8 6.6447 0.03273 *
pressure 23.9280 11.9640 2 16 28.2778 5.592e-06 ***
depth:pressure 3.7862 1.8931 2 16 4.4745 0.02861 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ggplot(data = df, aes(x=factor(pressure, level = c("low", "medium", "high")), y=biomass, group=depth, colour=depth))+
geom_line(stat = "summary") +
geom_point(stat = "summary")+
theme_minimal()
For biomass, propagule pressure effects vary with depth. Not a large change, but overall increase in biomass with pressure, more biomass in shallow water, but biomass in shallow water at high propagule pressure is much higher than expected.
#Can use aov or linear mixed models to do this
slength.lmer <-lmer(slength~depth + pressure + depth*pressure + (1|tank), REML=TRUE, df)
plot(slength.lmer)
summary(slength.lmer, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: slength ~ depth + pressure + depth * pressure + (1 | tank)
Data: df
REML criterion at convergence: 148.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.57618 -0.49869 0.09962 0.38553 2.16020
Random effects:
Groups Name Variance Std.Dev.
tank (Intercept) 18.960 4.354
Residual 6.264 2.503
Number of obs: 30, groups: tank, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 10.82310 1.45081 8.00000 7.460 7.20e-05 ***
depth1 3.44348 1.45081 8.00000 2.373 0.045 *
pressure1 4.59470 0.64621 16.00000 7.110 2.47e-06 ***
pressure2 -3.87776 0.64621 16.00000 -6.001 1.85e-05 ***
depth1:pressure1 0.28223 0.64621 16.00000 0.437 0.668
depth1:pressure2 -0.05518 0.64621 16.00000 -0.085 0.933
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) depth1 prssr1 prssr2 dpt1:1
depth1 0.000
pressure1 0.000 0.000
pressure2 0.000 0.000 -0.500
dpth1:prss1 0.000 0.000 0.000 0.000
dpth1:prss2 0.000 0.000 0.000 0.000 -0.500
anova (slength.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
depth 35.29 35.287 1 8 5.6335 0.045 *
pressure 366.62 183.311 2 16 29.2652 4.511e-06 ***
depth:pressure 1.34 0.671 2 16 0.1072 0.899
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ggplot(data = df, aes(x=factor(pressure, level = c("low", "medium", "high")), y=slength, group=depth, colour=depth))+
geom_line(stat = "summary") +
geom_point(stat = "summary")+
theme_minimal()
Again, clear effects of propagule pressure and depth. Four-fold difference in propagule pressure has a bit more effect than depth does.
Seitz, Lipcius, and Hines (2016) studied the estuarine clam Macoma balthica and set up a field experiment to examine the effects of two different sediment types (shallow-mud with high food availability and muddy-sand with low food availability) and predation by fish and crabs on the growth rate of clams in upper Chesapeake Bay. They used a split-plot design where there were eight shallow-mud and four muddy-sand sites, with more shallow-mud sites because of greater variability; sediment type was the between-plots fixed factor with random sites (“plots”) nested in each type. Within each site, there was one sub-plot (0.25m2) that excluded predators with a cage and a second sub-plot that had no cage; predation was the within-plots fixed factor. Ten individually marked and measured clams were transplanted into each sub-plot and retrieved 20-22 days later to record growth.
The data are available from dryad. The data for this example are in GrowSplitPlot.xlsx, but it needs quite a bit of work for us to work with it in R. Our tidied version is here
Load the data and have a quick look. The variable names should be clear
df <- read.csv("data/seitz.csv")
df$Plot <-as.factor(df$Plot)
head(df,10)
set_sum_contrasts()
setting contr.sum globally: options(contrasts=c('contr.sum', 'contr.poly'))
boxplot(Growth~Habitat*Predation, df)
Some hint of skew and associated differences in variance, but not dramatic
**Examine this effect using a linear mixed model approach and REML
seitz.lmer1 <- lmer(Growth~Habitat + Predation + Habitat*Predation + (1|Plot)+(1|Plot:Predation), REML=TRUE, df)
plot(seitz.lmer1)
#Nothing untoward in residual plot, though could also be interpreted as a wedge shape, with greater spread for the 2 or 3 fitted values at the right
summary(seitz.lmer1, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: Growth ~ Habitat + Predation + Habitat * Predation + (1 | Plot) +
(1 | Plot:Predation)
Data: df
REML criterion at convergence: 140.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.6317 -0.4748 -0.0702 0.3421 3.3436
Random effects:
Groups Name Variance Std.Dev.
Plot:Predation (Intercept) 0.02572 0.1604
Plot (Intercept) 0.15217 0.3901
Residual 0.12198 0.3493
Number of obs: 127, groups: Plot:Predation, 21; Plot, 11
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.234265 0.132915 9.125573 1.763 0.1114
Habitat1 0.282094 0.132915 9.125573 2.122 0.0624 .
Predation1 0.057716 0.052173 9.172918 1.106 0.2968
Habitat1:Predation1 0.004061 0.052173 9.172918 0.078 0.9396
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Habtt1 Prdtn1
Habitat1 -0.281
Predation1 0.046 -0.073
Hbtt1:Prdt1 -0.073 0.046 -0.321
anova (seitz.lmer1, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Habitat 0.54944 0.54944 1 9.1256 4.5044 0.06238 .
Predation 0.14928 0.14928 1 9.1729 1.2238 0.29678
Habitat:Predation 0.00074 0.00074 1 9.1729 0.0061 0.93963
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Get VCs and CIs
seitz.ci <- confint.merMod(seitz.lmer1,oldNames=FALSE)
Computing profile confidence intervals ...
seitz.vc <- (seitz.ci)^2
print(seitz.vc)
2.5 % 97.5 %
sd_(Intercept)|Plot:Predation 0.0000000000 0.08298225
sd_(Intercept)|Plot 0.0444074512 0.36139793
sigma 0.0941305329 0.16060030
(Intercept) 0.0004975448 0.24259155
Habitat1 0.0006489555 0.29193717
Predation1 0.0017822688 0.02521683
Habitat1:Predation1 0.0091121702 0.01116806
If we follow the example of Box 11.3, there is some variation in opinion as to whether we’d try a slightly simpler model, omitting the Predation x Plot interaction, which doesn’t account for much variation.
**What do you think of this option, and would it make a difference?
# Use ML to compare models
seitz.lmer1a <- lmer(Growth~Habitat + Predation + Habitat*Predation + (1|Plot)+(1|Plot:Predation), REML=FALSE, df)
seitz.lmer2 <- lmer(Growth~Habitat + Predation + Habitat*Predation + (1|Plot), REML=TRUE, df)
anova(seitz.lmer1a, seitz.lmer2)
refitting model(s) with ML (instead of REML)
Data: df
Models:
seitz.lmer2: Growth ~ Habitat + Predation + Habitat * Predation + (1 | Plot)
seitz.lmer1a: Growth ~ Habitat + Predation + Habitat * Predation + (1 | Plot) + (1 | Plot:Predation)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
seitz.lmer2 6 141.64 158.71 -64.822 129.64
seitz.lmer1a 7 141.13 161.04 -63.568 127.14 2.5081 1 0.1133
AICc(seitz.lmer1a, seitz.lmer2)
#The simpler model fits reasonably well - no major loss of fit from omitting that term
#Use simpler model
Simpler model fits OK, so look at it’s output (fitting with REML)
seitz.lmer3 <- lmer(Growth~Habitat + Predation + Habitat*Predation + (1|Plot), REML=TRUE, df)
summary(seitz.lmer3, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: Growth ~ Habitat + Predation + Habitat * Predation + (1 | Plot)
Data: df
REML criterion at convergence: 143.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.4027 -0.5340 -0.0890 0.3914 3.2136
Random effects:
Groups Name Variance Std.Dev.
Plot (Intercept) 0.1668 0.4084
Residual 0.1328 0.3645
Number of obs: 127, groups: Plot, 11
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.366e-01 1.336e-01 9.226e+00 1.772 0.1094
Habitat1 2.840e-01 1.336e-01 9.226e+00 2.126 0.0616 .
Predation1 5.747e-02 3.796e-02 1.150e+02 1.514 0.1328
Habitat1:Predation1 1.277e-03 3.796e-02 1.150e+02 0.034 0.9732
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Habtt1 Prdtn1
Habitat1 -0.286
Predation1 0.076 -0.089
Hbtt1:Prdt1 -0.089 0.076 -0.415
anova (seitz.lmer3, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Habitat 0.60064 0.60064 1 9.226 4.5217 0.06164 .
Predation 0.30446 0.30446 1 115.033 2.2919 0.13279
Habitat:Predation 0.00015 0.00015 1 115.033 0.0011 0.97322
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The simpler model makes little difference to our conclusion - no hint of a detectable Hab x Pred interaction
Simpler is better when explaining your analysis to an audience!
One simple step is to recognize that the individual growth measurements aren’t independent experimental units, but from animals within the same Predation treatment unit. We could replace the individual observations with the mean of animals within each plot. We’d get a slightly simpler analysis, but not markedly so.
The bigger step would be to generate means, as above, and then calculate a measure of predation for each plot, as the difference in growth between Exclusion and Uncaged areas. We then have a single difference value for each plot. Our question of whether predation varies with habitat could then be addressed with a t-test between habitats or calculating confidence intervals around the means for each habitat type.
df2 <- df %>%
group_by (Habitat, Predation, Plot) %>%
dplyr::summarise (Growth = mean(Growth)) %>%
as.data.frame()
`summarise()` has grouped output by 'Habitat', 'Predation'. You can override using the `.groups` argument.
head(df2,10)
boxplot(Growth~Habitat*Predation, df2)
seitz.lmer4 <- lmer(Growth~Habitat + Predation + Habitat*Predation + (1|Plot), REML=TRUE, df2)
plot(seitz.lmer4)
#At this point, wedge-shaped pattern of residuals is a bit more conspicuous, so think about whether a transform might be useful
Lgrowth<-log(df2$Growth + 1)
summary(seitz.lmer4, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: Growth ~ Habitat + Predation + Habitat * Predation + (1 | Plot)
Data: df2
REML criterion at convergence: 24.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.56607 -0.33662 0.02525 0.20493 1.38215
Random effects:
Groups Name Variance Std.Dev.
Plot (Intercept) 0.15159 0.3893
Residual 0.04039 0.2010
Number of obs: 21, groups: Plot, 11
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.233769 0.130340 8.901071 1.794 0.1068
Habitat1 0.281775 0.130340 8.901071 2.162 0.0592 .
Predation1 0.056299 0.045832 8.098174 1.228 0.2538
Habitat1:Predation1 0.005793 0.045832 8.098174 0.126 0.9025
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Habtt1 Prdtn1
Habitat1 -0.265
Predation1 -0.018 -0.018
Hbtt1:Prdt1 -0.018 -0.018 -0.207
anova (seitz.lmer4, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Habitat 0.188745 0.188745 1 8.9011 4.6736 0.05922 .
Predation 0.060940 0.060940 1 8.0982 1.5089 0.25380
Habitat:Predation 0.000645 0.000645 1 8.0982 0.0160 0.90249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
seitz.lmer5 <- lmer(Lgrowth~Habitat + Predation + Habitat*Predation + (1|Plot), REML=TRUE, df2)
plot(seitz.lmer5)
#Residuals don't look much better. Leave as is
If students attempt this, it’s two steps - run summary statistics, e.g. using dplyr::summarise, then use tidyr::pivot_wider to create a data frame in which Exclusion and Uncaged values are different variables. Predation <-Uncaged - Exclusion.
df3<-pivot_wider(df2,
names_from = Predation,
values_from = Growth
)
df3$Predation <-df3$Exclusion - df3$Uncaged
head(df3,10)
boxplot(Predation~Habitat, df3)
t.test(Predation~Habitat, data=df3, var.equal = FALSE)
Welch Two Sample t-test
data: Predation by Habitat
t = 0.088487, df = 5.8778, p-value = 0.9324
alternative hypothesis: true difference in means between group Mud and group Sand is not equal to 0
95 percent confidence interval:
-0.3583016 0.3850476
sample estimates:
mean in group Mud mean in group Sand
0.1143849 0.1010119
Dalziel et al. (2015) were interested in disentangling the processes of acclimation and adaptation to changing environments, and examined levels of phenotypic plasticity in a small fish, the lake whitefish. This fish has developed two ecotypes - phenotypes that feed in different ways. Dwarf ecotypes are actively swimming and feeding, but slow-growing, while the ancestral normal phenotype is more sedentary. It’s thought that these ecotypes evolved separately under past climates, but then came together and can co-occur. There are anatomical and physiological differences between the ecotypes that are linked to swimming ability, and Dalziel and her colleagues were interested in how these ecotypes responed to a changed environment with faster-flowing water, particularly any role of phenotypic plasticity.
Phenotypic plasticity is best assessed by placing different phenotypes into the same set of varied environments and measuring their performance. They did this for whitefish by using roughly size-matched, laboratory-fertilized and reared fish from each ecotype, which were then placed in two environments, still water and fast-flowing water. Fish were kept in these environments for approximately 3 months, then sampled.
Flow environments were produced in 1 m high circular tanks, .6 m in diameter, with a .16 m diameter cylinder in the centre of the tank, creating a circular swimming “race track”. Within this track, water flowed at 7.6 cm/s for 6 h/day or 0.5 cm/s. Each tank contained 8 fish of each ecotype.
At the end of the experiment, fish were sacrficed, and the researchers took hematocrit samples, measured heart morphology and took samples of red and white muscle. Red mussel samples were used for enzyme assays and histology, and white muscle to obtain mitochondria, whose performance was assessed.
They provided data and R code through dryad. We’ll focus on their data focusing on mitochondrial function in white muscle. These data are in their file Fig5_MitoRespiration.csv. If you want to explore some of their other analyses, you can use their R script, which allows easy selection of data files.
plasticity<-read.csv("data/Fig5_MitoRespiration.csv",header=TRUE,stringsAsFactors = FALSE, na.strings = c("NA",""))
head(plasticity, 10)
Look at data measuring fluxes of four oxidative complexes, I-IV. These are the variables flux1.4.mg, flux2.4mg, and flux.cox.mg
flux1.4.mg is total flux across the four complexes, flux2.4.mg is flux across complexes II-IV, and flux.cox.mg is flux across complex IV
#Make sure categorical predictors are treated the right way
plasticity$treatment <-as.factor(plasticity$treatment)
plasticity$ecotype <-as.factor(plasticity$ecotype)
plasticity$tank <-as.factor(plasticity$tank)
Students should check that the fluxes are independent
scatterplotMatrix(data = plasticity, ~ flux1.4.mg + flux2.4.mg + flux.cox.mg)
Be a bit cautious about interpreting results for all 3
plasticity.lmer <- lmer(flux1.4.mg ~ ecotype*treatment + (1|tank), REML=TRUE, plasticity)
boundary (singular) fit: see help('isSingular')
anova(plasticity.lmer)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ecotype 54.5 54.5 1 40 0.0794 0.77962
treatment 50.5 50.5 1 40 0.0735 0.78772
ecotype:treatment 4756.6 4756.6 1 40 6.9221 0.01203 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ggplot(data=subset(plasticity, !is.na(flux1.4.mg)), aes(x = treatment, y = flux1.4.mg, group=ecotype, colour = ecotype))+
geom_line(stat = "summary")+
geom_point(stat = "summary")+
theme_minimal()+
ylim(0,NA)
plasticity.lmer <- lmer(flux2.4.mg ~ ecotype*treatment + (1|tank), REML=TRUE, plasticity)
boundary (singular) fit: see help('isSingular')
anova(plasticity.lmer)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ecotype 317.87 317.87 1 39 3.8894 0.05571 .
treatment 19.42 19.42 1 39 0.2376 0.62869
ecotype:treatment 66.40 66.40 1 39 0.8124 0.37294
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ggplot(data=subset(plasticity, !is.na(flux2.4.mg)), aes(x = treatment, y = flux2.4.mg, group=ecotype, colour = ecotype))+
geom_line(stat = "summary")+
geom_point(stat = "summary")+
theme_minimal()+
ylim(0,NA)
plasticity.lmer <- lmer(flux.cox.mg ~ ecotype*treatment + (1|tank), REML=TRUE, plasticity)
anova(plasticity.lmer)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ecotype 1903.1 1903.1 1 40.964 0.4963 0.485105
treatment 720.3 720.3 1 5.698 0.1879 0.680628
ecotype:treatment 29534.2 29534.2 1 40.964 7.7023 0.008273 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ggplot(data=subset(plasticity, !is.na(flux.cox.mg)), aes(x = treatment, y = flux.cox.mg, group=ecotype, colour = ecotype))+
geom_line(stat = "summary")+
geom_point(stat = "summary")+
theme_minimal()+
ylim(0,NA)
The analysis suggests a detectable interaction between ecotype and treatment for Complex I-IV and for IV, but not for II-IV.
Simplest way to investigate is to look at interaction plots. For CI-IV, it looks like both ecotypes show plasticity, but in the opposite direction. Fluxes rise for dwarf ecotype under swimming, but fall for the normal ecotype. That pattern is also seen for the Complex IV alone. For II-IV, the patterns aren’t distinguishable from noise under the anova criteria we’ve used, but patterns of means are similar.
Morrissette-Boileau et al. (2018) examined factors affecting a shrub (dwarf birch) that is potentially expanding its range northwards with warming. They were interested in modifying factors - enhanced nutrients, warming, and browsing by caribou.
Caribou were not manipulated directly, but were excluded by fencing them out, and their foraging simulated by manual removal of 0, 25, or 75% of available shoots once each spring.
Nutrients were manipulated in 4 x 24 m plots, half of which received a nutrient supplement at bud-burst each year.
Each 4 x 24 area was divided into 4 x 4 m sub-plots. Each of these sub-plots had a 1 m x 1m area that was the subject of a combination of warming and caribou treatments.
Global warming was simulated using open-top hexagonal chambers that trap solar energy. They were used to establish two levels, ambient and warmed. Warming was applied to the target 1 x 1 m area thoughout the growing season.
The experiment ran for 5 years, after which time the authors recorded the total leafy biomass from each birch stem in the 1 x 1 target area. This time was the start of the 2014 growing season.
Their data are available from dryad. The relevant files are inside_productivity.csv and a very clear README_for_inside_productivity.txt file. We won’t explain the variables, and just point you to the readme. Figure 1 of the paper also has a nice diagram of the experiment.
Start by taking a quick look at the data file. The variables you may be working with for now are block, fertilization, temp, browsing, and biomass. Note that this file has a semicolon rather than comma as the delimiter, so make sure you take account of that when importing the data
df <- read_delim("data/inside_productivity.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
Rows: 60 Columns: 8── Column specification ──────────────────────────────────────────────────────────────────────
Delimiter: ";"
chr (4): Block, Fertilization, Temp, Parcelle_id
dbl (4): Biomass, Browsing, Biomass_beg2009, Biomass_end2009
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
set_sum_contrasts()
setting contr.sum globally: options(contrasts=c('contr.sum', 'contr.poly'))
head(df, 10)
df$Browsing <- as.factor(df$Browsing)
df$Block <- as.factor(df$Block)
df$Temp <- as.factor(df$Temp)
df$Fertilization <-as.factor(df$Fertilization)
Make sure you think carefully about scales of experimental units, which factors are between- and which are within-plots, and which are fixed vs random.
4 factors - fert, warming, browsing, block nutrients, block(nutrients), then fert x warming within each block. blocks is a random effect; all others are fixed
Make sure you include initial checking of assumptions
df.aov <- aov(Biomass ~ Fertilization*Temp*Browsing, data=df)
plot(df.aov)
Quick ‘n’ dirty look at residuals (omitting blocks) suggests some skew; try log transform
df2.aov <- aov(log(Biomass) ~ Fertilization*Temp*Browsing, data=df)
# Several zero biomass values that could possibly be removed - shoots may be dead
plot(df2.aov)
Logged values better behaved, though raw data not too bad
df$lbiomass <- log(df$Biomass)
We could fit the model a couple of ways, as shown on Box 11.5
df3.aov <- aov(lbiomass ~ Fertilization*Temp*Browsing + Error(Block/(Temp*Browsing)), df)
summary(df3.aov)
Error: Block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 4 1.302 0.3255
Error: Block:Temp
Df Sum Sq Mean Sq F value Pr(>F)
Temp 1 0.6969 0.6969 5.279 0.0832 .
Residuals 4 0.5281 0.1320
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Block:Browsing
Df Sum Sq Mean Sq F value Pr(>F)
Browsing 2 0.4176 0.2088 0.685 0.531
Residuals 8 2.4386 0.3048
Error: Block:Temp:Browsing
Df Sum Sq Mean Sq F value Pr(>F)
Temp:Browsing 2 0.2804 0.1402 0.773 0.493
Residuals 8 1.4502 0.1813
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Fertilization 1 0.086 0.08588 0.494 0.489
Fertilization:Temp 1 0.187 0.18694 1.076 0.310
Fertilization:Browsing 2 0.621 0.31063 1.787 0.189
Fertilization:Temp:Browsing 2 0.062 0.03086 0.178 0.838
Residuals 24 4.171 0.17379
Also run as linear mixed effect model using lmer or lme4
df.lmer <- lmer(lbiomass ~ Fertilization*Temp*Browsing + (1|Block) + (1|Block:Temp) + (1|Block:Browsing), REML=TRUE, df)
boundary (singular) fit: see help('isSingular')
plot(df.lmer)
summary(df.lmer, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: lbiomass ~ Fertilization * Temp * Browsing + (1 | Block) + (1 |
Block:Temp) + (1 | Block:Browsing)
Data: df
REML criterion at convergence: 103.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.31221 -0.63757 -0.02648 0.72355 2.00599
Random effects:
Groups Name Variance Std.Dev.
Block:Browsing (Intercept) 3.350e-02 1.830e-01
Block:Temp (Intercept) 1.982e-11 4.452e-06
Block (Intercept) 1.722e-03 4.150e-02
Residual 1.708e-01 4.133e-01
Number of obs: 60, groups: Block:Browsing, 15; Block:Temp, 10; Block, 5
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.452811 0.073653 4.000000 87.610 1.02e-07 ***
Fertilization1 0.037833 0.053356 32.000000 0.709 0.483
Temp1 -0.107773 0.053356 4.000000 -2.020 0.114
Browsing1 0.097999 0.100800 8.000000 0.972 0.359
Browsing2 0.007905 0.100800 8.000000 0.078 0.939
Fertilization1:Temp1 -0.055819 0.053356 32.000000 -1.046 0.303
Fertilization1:Browsing1 0.033867 0.075457 32.000000 0.449 0.657
Fertilization1:Browsing2 0.104192 0.075457 32.000000 1.381 0.177
Temp1:Browsing1 -0.096623 0.075457 32.000000 -1.281 0.210
Temp1:Browsing2 0.045436 0.075457 32.000000 0.602 0.551
Fertilization1:Temp1:Browsing1 0.027301 0.075457 32.000000 0.362 0.720
Fertilization1:Temp1:Browsing2 0.017718 0.075457 32.000000 0.235 0.816
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Frtlz1 Temp1 Brwsn1 Brwsn2 Fr1:T1 Fr1:B1 Fr1:B2 Tm1:B1 Tm1:B2 F1:T1:B1
Fertiliztn1 0.000
Temp1 0.000 0.000
Browsing1 0.000 0.000 0.000
Browsing2 0.000 0.000 0.000 -0.500
Frtlztn1:T1 0.000 0.000 0.000 0.000 0.000
Frtlztn1:B1 0.000 0.000 0.000 0.000 0.000 0.000
Frtlztn1:B2 0.000 0.000 0.000 0.000 0.000 0.000 -0.500
Tmp1:Brwsn1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Tmp1:Brwsn2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.500
Frtl1:T1:B1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Frtl1:T1:B2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
anova(df.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Fertilization 0.08588 0.08588 1 32 0.5028 0.4834
Temp 0.69690 0.69690 1 4 4.0799 0.1135
Browsing 0.23403 0.11702 2 8 0.6851 0.5313
Fertilization:Temp 0.18694 0.18694 1 32 1.0944 0.3033
Fertilization:Browsing 0.62127 0.31063 2 32 1.8186 0.1786
Temp:Browsing 0.28041 0.14021 2 32 0.8208 0.4491
Fertilization:Temp:Browsing 0.06172 0.03086 2 32 0.1807 0.8356
df2.lmer <- lmer(lbiomass ~ Fertilization*Temp*Browsing + (1|Block) + (1|Block:Temp) + (1|Block:Browsing) + Biomass_end2009, REML=TRUE, df)
boundary (singular) fit: see help('isSingular')
summary(df2.lmer, ddf="Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method ['lmerModLmerTest']
Formula: lbiomass ~ Fertilization * Temp * Browsing + (1 | Block) + (1 |
Block:Temp) + (1 | Block:Browsing) + Biomass_end2009
Data: df
REML criterion at convergence: 102.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.69166 -0.65201 -0.09715 0.58779 1.72815
Random effects:
Groups Name Variance Std.Dev.
Block:Browsing (Intercept) 0.018490 0.13598
Block:Temp (Intercept) 0.000000 0.00000
Block (Intercept) 0.000823 0.02869
Residual 0.148825 0.38578
Number of obs: 60, groups: Block:Browsing, 15; Block:Temp, 10; Block, 5
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.774061 0.236503 40.259856 24.414 <2e-16 ***
Fertilization1 0.015193 0.050382 31.414004 0.302 0.7650
Temp1 -0.059776 0.052352 4.598212 -1.142 0.3095
Browsing1 0.032343 0.088956 8.462632 0.364 0.7251
Browsing2 -0.010887 0.086406 7.799225 -0.126 0.9029
Biomass_end2009 0.013584 0.004566 44.990286 2.975 0.0047 **
Fertilization1:Temp1 -0.093401 0.051381 31.977265 -1.818 0.0785 .
Fertilization1:Browsing1 0.023906 0.070513 31.111014 0.339 0.7369
Fertilization1:Browsing2 0.066383 0.071571 31.543464 0.928 0.3607
Temp1:Browsing1 -0.084851 0.070544 31.124084 -1.203 0.2381
Temp1:Browsing2 0.044304 0.070434 31.078418 0.629 0.5339
Fertilization1:Temp1:Browsing1 0.015981 0.070536 31.120614 0.227 0.8222
Fertilization1:Temp1:Browsing2 0.046471 0.071093 31.349849 0.654 0.5181
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation matrix not shown by default, as p = 13 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
anova(df2.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Fertilization 0.01353 0.01353 1 31.414 0.0909 0.76497
Temp 0.19403 0.19403 1 4.598 1.3037 0.30948
Browsing 0.02003 0.01001 2 8.357 0.0673 0.93545
Biomass_end2009 1.31712 1.31712 1 44.990 8.8501 0.00470 **
Fertilization:Temp 0.49179 0.49179 1 31.977 3.3045 0.07847 .
Fertilization:Browsing 0.24829 0.12415 2 31.481 0.8342 0.44361
Temp:Browsing 0.21547 0.10773 2 31.106 0.7239 0.49285
Fertilization:Temp:Browsing 0.12480 0.06240 2 31.216 0.4193 0.66115
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#interaction_plot(df3.aov, ~Fertilization:Temp)
We’ve just considered the 4 x 24 m plots in which nutrients were in which nutrients were manipulated, but in the actual experiment, these plots were themselves grouped within 5 caribou fences, each with 12 plots.
You could challenge yourself by thinking about how you’d incorporate this aspect into your model & analysis, including the complications that might arise from adding a second random factor.